NSF PAR Search | NSF Public Access Repository

Finite Time Guarantees for Continuous State MDPs with Generative Model

https://doi.org/10.1109/CDC42340.2020.9303840

Sharma, Hiteshi; Jain, Rahul (December 2020, 2020 59th IEEE Conference on Decision and Control (CDC))
Full Text Available
An Approximately Optimal Relative Value Learning Algorithm for Averaged MDPs with Continuous States and Actions

https://doi.org/10.1109/ALLERTON.2019.8919719

Sharma, Hiteshi; Jain, Rahul (September 2019, 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton))

It has long been a challenging problem to design algorithms for Markov decision processes (MDPs) with continuous states and actions that are provably approximately optimal and can provide arbitrarily good approximation for any MDP. In this paper, we propose an empirical value learning algorithm for average-reward MDPs with continuous states and actions that combines empirical value iteration with function-parametric approximation, and approximates the transition probability distribution with kernel density estimation. We view each iteration as the application of a random operator and argue convergence using the probabilistic contraction analysis method that the authors (along with others) have recently developed.
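The abstract above combines sampled Bellman updates, a parametric value-function fit, and kernel density estimation of the transition law. The following is a minimal Python sketch of how these pieces can fit together for the average-reward case. It is not the authors' algorithm. The toy dynamics, the reward, the quadratic basis, the action grid (used in place of a true continuous maximization), the bandwidth, and every function name are hypothetical. The KDE is fitted to simulator draws and then resampled, which may differ from the paper's estimator. Each pass builds a fresh empirical Bellman operator from new samples, which is one concrete reading of "random operator". Subtracting the value at a reference state (relative value iteration) then tracks the average reward.

```python
import math
import random

# Hypothetical 1-D test problem (not from the paper): states and actions in [0, 1].
def reward(s, a):
    return -(s - 0.5) ** 2 - 0.1 * a ** 2

def generative_model(s, a, rng):
    """Assumed simulator: one draw of s' ~ P(. | s, a), clipped to [0, 1]."""
    return min(1.0, max(0.0, s + 0.5 * (a - s) + rng.gauss(0.0, 0.05)))

def sample_from_kde(draws, bandwidth, rng):
    """Sample a Gaussian KDE of `draws`: pick a data point, then add kernel noise."""
    return min(1.0, max(0.0, rng.choice(draws) + rng.gauss(0.0, bandwidth)))

def features(s):
    return [1.0, s, s * s]  # quadratic basis = the assumed parametric function class

def least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gaussian elimination."""
    d = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    for c in range(d):
        p = max(range(c, d), key=lambda k: abs(A[k][c]))  # partial pivoting
        A[c], A[p], b[c], b[p] = A[p], A[c], b[p], b[c]
        for k in range(c + 1, d):
            f = A[k][c] / A[c][c]
            A[k] = [ak - f * ac for ak, ac in zip(A[k], A[c])]
            b[k] -= f * b[c]
    w = [0.0] * d
    for c in reversed(range(d)):  # back substitution
        w[c] = (b[c] - sum(A[c][k] * w[k] for k in range(c + 1, d))) / A[c][c]
    return w

def empirical_rvi(n_anchors=20, n_draws=20, iters=30, bandwidth=0.05, seed=0):
    rng = random.Random(seed)
    actions = [i / 10 for i in range(11)]  # finite grid standing in for continuous actions
    anchors = [i / (n_anchors - 1) for i in range(n_anchors)]
    s_ref = anchors[0]
    w = [0.0, 0.0, 0.0]

    def h(s):  # current fitted relative value function (always reads the latest w)
        return sum(wi * fi for wi, fi in zip(w, features(s)))

    gain = 0.0
    for _ in range(iters):
        targets = []
        for s in anchors:
            best = -math.inf
            for a in actions:
                draws = [generative_model(s, a, rng) for _ in range(n_draws)]
                nxt = [sample_from_kde(draws, bandwidth, rng) for _ in range(n_draws)]
                best = max(best, reward(s, a) + sum(h(x) for x in nxt) / n_draws)
            targets.append(best)
        # Fresh samples every pass make this a random operator; then fit the basis.
        w = least_squares([features(s) for s in anchors], targets)
        gain = h(s_ref)  # gain (average-reward) estimate at the reference state
        w[0] -= gain     # shift the constant term so that h(s_ref) == 0
    return h, gain
```

Usage: `h, gain = empirical_rvi()` returns the fitted relative value function and a gain estimate, with `h(0.0)` equal to zero by construction. Sampling from a Gaussian KDE by picking a data point and adding kernel noise is exact for that mixture and never requires evaluating the density.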
Full Text Available
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Wei, Chen-Yu; Jafarnia-Jahromi, Mehdi; Luo, Haipeng; Sharma, Hiteshi; Jain, Rahul (July 2020, International Conference on Machine Learning)
Full Text Available
Empirical Algorithms for General Stochastic Systems with Continuous States and Actions

https://doi.org/10.1109/CDC40024.2019.9029308

Sharma, Hiteshi; Jain, Rahul; Haskell, William (December 2019, Proc. IEEE Conference on Decision and Control)

Full Text Available
A Universal Empirical Dynamic Programming Algorithm for Continuous State MDPs

https://doi.org/10.1109/TAC.2019.2907414

Haskell, William B.; Jain, Rahul; Sharma, Hiteshi; Yu, Pengqian (January 2020, IEEE Transactions on Automatic Control)
Full Text Available
An Empirical Relative Value Learning Algorithm for Non-parametric MDPs with Continuous State Space

https://doi.org/10.23919/ECC.2019.8795982

Sharma, Hiteshi; Jain, Rahul; Gupta, Abhishek (June 2019, 2019 18th European Control Conference (ECC))

We propose an empirical relative value learning (ERVL) algorithm for non-parametric MDPs with a continuous state space, finite actions, and the average-reward criterion. The ERVL algorithm relies on function approximation via nearest neighbors and on minibatch samples for the value function update. It is universal (it works for any MDP), computationally simple, and yet provides arbitrarily good approximation with high probability in finite time. As far as we know, this is the first algorithm with these provable properties for non-parametric (continuous state space) MDPs under the average-reward criterion. Numerical evaluation on a benchmark problem of optimal replacement suggests good performance.
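The following is a minimal sketch of an ERVL-style loop with finite actions, written in Python under stated assumptions. A made-up machine-replacement problem stands in for the paper's benchmark, whose details are not given here: wear lies in [0, 1], and the actions are keep or replace. The relative value function is stored on a uniform grid of anchor states and extended to other states by k-nearest-neighbor averaging. Each expectation is replaced by the mean over a small minibatch of simulator calls. Names such as `step`, `ervl_sketch`, and `greedy_action` are hypothetical, and this is not the authors' code.

```python
import random

ACTIONS = (0, 1)  # 0 = keep, 1 = replace (hypothetical replacement-style problem)

def step(s, a, rng):
    """Assumed generative model: returns (reward, next_state) for wear s in [0, 1]."""
    if a == 1:
        return -0.5, min(1.0, rng.expovariate(10.0))  # pay to replace; wear restarts
    return -s, min(1.0, s + rng.expovariate(10.0))    # operating cost grows with wear

def knn_value(x, anchors, h, k=2):
    """Nearest-neighbor estimate: average h over the k anchors closest to x."""
    nearest = sorted(range(len(anchors)), key=lambda i: abs(anchors[i] - x))[:k]
    return sum(h[i] for i in nearest) / k

def q_value(s, a, anchors, h, batch, rng):
    """Minibatch estimate of r(s, a) + E[h(s')] from `batch` simulator calls."""
    total = 0.0
    for _ in range(batch):
        r, nxt = step(s, a, rng)
        total += r + knn_value(nxt, anchors, h)
    return total / batch

def ervl_sketch(n_anchors=25, batch=10, iters=50, seed=1):
    rng = random.Random(seed)
    anchors = [i / (n_anchors - 1) for i in range(n_anchors)]
    h, gain = [0.0] * n_anchors, 0.0
    for _ in range(iters):
        t = [max(q_value(s, a, anchors, h, batch, rng) for a in ACTIONS) for s in anchors]
        gain = t[0]                  # reference state anchors[0] = 0.0
        h = [ti - gain for ti in t]  # relative values, so h[0] == 0
    return anchors, h, gain

def greedy_action(s, anchors, h, batch=50, seed=2):
    """Greedy action at state s under the learned relative values."""
    rng = random.Random(seed)
    return max(ACTIONS, key=lambda a: q_value(s, a, anchors, h, batch, rng))
```

With the default seeds, one would expect `greedy_action(0.0, anchors, h)` to return 0 (keep a new machine) and `greedy_action(1.0, anchors, h)` to return 1 (replace a fully worn one). Averaging over k neighbors is a non-expansion in the sup norm, so the nearest-neighbor step does not amplify errors in the relative value update.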
Full Text Available
Approximate Relative Value Learning for Average-reward Continuous State MDPs

Sharma, Hiteshi; Jafarnia-Jahromi, Mehdi; Jain, Rahul (July 2019, Proceedings UAI)

In this paper, we propose an approximate relative value learning (ARVL) algorithm for non-parametric MDPs with a continuous state space, finite actions, and the average-reward criterion. It is a sampling-based algorithm combined with kernel density estimation and function approximation via nearest neighbors. The theoretical analysis is done via a random contraction operator framework and a stochastic dominance argument. As far as we know, this is the first algorithm with these provable properties for continuous state space MDPs under the average-reward criterion that does not require any discretization of the state space. We then evaluate the proposed algorithm numerically on a benchmark problem.
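The ARVL abstract pairs kernel density estimation with nearest-neighbor function approximation. One way to combine them is sketched below in Python under explicit assumptions. The sketch draws next-state samples and forms a Gaussian KDE of the transition density. It then integrates the nearest-neighbor extension of the value function against that density on a quadrature grid, renormalized to the state space [0, 1]. The drift dynamics, reward, bandwidth, grids, and all function names are hypothetical. The code illustrates the ingredients, not the paper's algorithm or its analysis.

```python
import math
import random

# Hypothetical 1-D problem (not from the paper): two actions drift the state
# left or right; the reward depends only on the distance of s from 0.7.
ACTIONS = (0, 1)  # 0 = drift left, 1 = drift right

def simulate(s, a, rng):
    """Assumed generative model: one draw of s' ~ P(. | s, a), clipped to [0, 1]."""
    drift = 0.1 if a == 1 else -0.1
    return min(1.0, max(0.0, s + drift + rng.gauss(0.0, 0.03)))

def reward(s):
    return -abs(s - 0.7)  # actions affect only the dynamics

def kde_weights(samples, quad, bw):
    """Unnormalized Gaussian KDE of the samples, evaluated at the quadrature points."""
    return [sum(math.exp(-0.5 * ((y - x) / bw) ** 2) for x in samples) for y in quad]

def nn_extend(anchors, h, points):
    """1-nearest-neighbor extension of h from the anchor states to `points`."""
    return [h[min(range(len(anchors)), key=lambda j: abs(anchors[j] - y))] for y in points]

def q_value(s, a, h_quad, quad, n_samples, bw, rng):
    """r(s) + E[h(s')], integrating h against the KDE renormalized on the grid."""
    samples = [simulate(s, a, rng) for _ in range(n_samples)]
    w = kde_weights(samples, quad, bw)
    return reward(s) + sum(wi * hi for wi, hi in zip(w, h_quad)) / sum(w)

def arvl_sketch(n_anchors=21, n_samples=15, iters=30, bw=0.05, seed=3):
    rng = random.Random(seed)
    anchors = [i / (n_anchors - 1) for i in range(n_anchors)]
    quad = [i / 50 for i in range(51)]  # quadrature grid on [0, 1]
    h, gain = [0.0] * n_anchors, 0.0
    for _ in range(iters):
        h_quad = nn_extend(anchors, h, quad)
        t = [max(q_value(s, a, h_quad, quad, n_samples, bw, rng) for a in ACTIONS)
             for s in anchors]
        gain = t[0]                  # reference state anchors[0] = 0.0
        h = [ti - gain for ti in t]  # relative values, so h[0] == 0
    return anchors, quad, h, gain

def greedy_action(s, anchors, quad, h, n_samples=50, bw=0.05, seed=4):
    """Greedy action at state s under the learned relative values."""
    rng = random.Random(seed)
    h_quad = nn_extend(anchors, h, quad)
    return max(ACTIONS, key=lambda a: q_value(s, a, h_quad, quad, n_samples, bw, rng))
```

`arvl_sketch()` returns the anchors, the quadrature grid, the relative values (with `h[0] == 0` by construction), and a gain estimate. The KDE weights are renormalized over the grid, so the Gaussian kernel's constant factor cancels and is omitted.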
Full Text Available